35 research outputs found
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods, which have tackled this
problem in a deterministic or non-parametric way, we propose a novel approach
that models future frames in a probabilistic manner. Our probabilistic model
makes it possible for us to sample and synthesize many possible future frames
from a single input image. Future frame synthesis is challenging, as it
involves low- and high-level image and motion understanding. We propose a novel
network structure, namely a Cross Convolutional Network to aid in synthesizing
future frames; this network structure encodes image and motion information as
feature maps and convolutional kernels, respectively. In experiments, our model
performs well on synthetic data, such as 2D shapes and animated game sprites,
as well as on real-world videos. We also show that our model can be applied to
tasks such as visual analogy-making, and present an analysis of the learned
network representations.
Comment: The first two authors contributed equally to this work.
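The cross-convolution idea above lends itself to a compact sketch: the image encoder yields feature maps, the motion encoder yields per-sample kernels, and the two are combined by convolving each sample's feature maps with that sample's kernels. The snippet below is a minimal PyTorch-style illustration under assumed shapes; the function name cross_convolve and the use of a grouped convolution are our assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a cross convolution: image information lives in feature
# maps, motion information lives in per-sample convolutional kernels, and the
# kernels are convolved with the feature maps. Shapes are illustrative.
import torch
import torch.nn.functional as F

def cross_convolve(feature_maps, kernels):
    """feature_maps: (B, C, H, W) image features.
    kernels: (B, C, k, k), one k x k motion kernel per channel per sample.
    Returns motion-transformed features of shape (B, C, H, W)."""
    B, C, H, W = feature_maps.shape
    k = kernels.shape[-1]
    # Fold the batch into the channel dimension so each sample's channels are
    # convolved with that sample's own kernels (grouped convolution).
    x = feature_maps.reshape(1, B * C, H, W)
    w = kernels.reshape(B * C, 1, k, k)
    out = F.conv2d(x, w, padding=k // 2, groups=B * C)
    return out.reshape(B, C, H, W)

# Example: 2 samples, 8 feature channels, 5x5 motion kernels.
feats = torch.randn(2, 8, 64, 64)
motion_kernels = torch.randn(2, 8, 5, 5)
print(cross_convolve(feats, motion_kernels).shape)  # torch.Size([2, 8, 64, 64])
```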
Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods that have tackled this
problem in a deterministic or non-parametric way, we propose to model future
frames in a probabilistic manner. Our probabilistic model makes it possible for
us to sample and synthesize many possible future frames from a single input
image. To synthesize realistic movement of objects, we propose a novel network
structure, namely a Cross Convolutional Network; this network encodes image and
motion information as feature maps and convolutional kernels, respectively. In
experiments, our model performs well on synthetic data, such as 2D shapes and
animated game sprites, and on real-world video frames. We present analyses of
the learned network representations, showing that it implicitly learns a
compact encoding of object appearance and motion. We also demonstrate a few of
its applications, including visual analogy-making and video extrapolation.
Comment: Journal preprint of arXiv:1607.02586 (IEEE TPAMI, 2019). The first
two authors contributed equally to this work. Project page:
http://visualdynamics.csail.mit.edu
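Because motion is modelled probabilistically, several distinct futures can be drawn from one input image. The toy module below illustrates that sampling loop under an assumed conditional, VAE-like interface; the class name ToyFutureSampler, the layer sizes, and the way the latent is concatenated to the image features are illustrative assumptions rather than the paper's architecture.

```python
# Illustrative sketch of probabilistic future sampling: each draw of a motion
# latent z ~ N(0, I) decodes to one plausible future frame for the same input.
import torch
import torch.nn as nn

class ToyFutureSampler(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.latent_dim = latent_dim
        self.image_encoder = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        # Decoder maps image features plus a motion latent to a future frame.
        self.decoder = nn.Conv2d(8 + latent_dim, 3, 3, padding=1)

    def forward(self, image, num_samples=4):
        B, _, H, W = image.shape
        feats = self.image_encoder(image)
        frames = []
        for _ in range(num_samples):
            # Each latent draw corresponds to one plausible motion.
            z = torch.randn(B, self.latent_dim, 1, 1).expand(B, self.latent_dim, H, W)
            frames.append(self.decoder(torch.cat([feats, z], dim=1)))
        return torch.stack(frames, dim=1)  # (B, num_samples, 3, H, W)

model = ToyFutureSampler()
futures = model(torch.randn(1, 3, 64, 64), num_samples=3)
print(futures.shape)  # torch.Size([1, 3, 3, 64, 64])
```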
Reconstruct-and-Generate Diffusion Model for Detail-Preserving Image Denoising
Image denoising is a fundamental and challenging task in the field of
computer vision. Most supervised denoising methods learn to reconstruct clean
images from noisy inputs; these methods have an intrinsic spectral bias and tend to
produce over-smoothed, blurry results. Recently, researchers have explored
diffusion models to generate high-frequency details in image restoration tasks,
but these models do not guarantee that the generated texture aligns with real
images, leading to undesirable artifacts. To address the trade-off between
visual appeal and fidelity of high-frequency details in denoising tasks, we
propose a novel approach called the Reconstruct-and-Generate Diffusion Model
(RnG). Our method leverages a reconstructive denoising network to recover the
majority of the underlying clean signal, which serves as the initial estimate
for subsequent steps to maintain fidelity. Additionally, it employs a diffusion
algorithm to generate residual high-frequency details, thereby enhancing visual
quality. We further introduce a two-stage training scheme to ensure effective
collaboration between the reconstructive and generative modules of RnG. To
reduce undesirable texture introduced by the diffusion model, we also propose
an adaptive step controller that regulates the number of inverse steps applied
by the diffusion model, allowing control over the level of high-frequency
detail added to each patch while also reducing the inference cost.
Through our proposed RnG, we achieve a better balance between perception and
distortion. We conducted extensive experiments on both synthetic and real
denoising datasets, validating the superiority of the proposed approach.
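A hedged sketch of the two-module pipeline described above: a reconstructive denoiser produces a faithful base estimate, a diffusion model then adds residual high-frequency detail, and a step controller decides how many reverse steps to spend. The callables reconstructive_denoiser, diffusion_reverse_step, and predict_num_steps are placeholders rather than the authors' API, and the per-image (rather than per-patch) controller is a simplification.

```python
import torch

def rng_denoise(noisy, reconstructive_denoiser, diffusion_reverse_step,
                predict_num_steps, max_steps=10):
    # Stage 1: a reconstructive network recovers the bulk of the clean signal,
    # giving a fidelity-preserving base estimate.
    base = reconstructive_denoiser(noisy)
    # Step controller: decide how many reverse diffusion steps to spend (more
    # for texture-rich content, fewer for smooth content), capping cost.
    num_steps = min(predict_num_steps(base, noisy), max_steps)
    # Stage 2: the diffusion model refines the base estimate with residual
    # high-frequency detail, starting from the reconstruction rather than noise.
    x = base
    for t in reversed(range(num_steps)):
        x = diffusion_reverse_step(x, t, condition=base)
    return x

# Toy usage with stand-in components (identity denoiser, no-op reverse step).
noisy = torch.randn(1, 3, 64, 64)
out = rng_denoise(noisy,
                  reconstructive_denoiser=lambda x: x,
                  diffusion_reverse_step=lambda x, t, condition: x,
                  predict_num_steps=lambda base, noisy: 4)
print(out.shape)  # torch.Size([1, 3, 64, 64])
```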
A computational approach for obstruction-free photography
We present a unified computational approach for taking photos through reflecting or occluding elements such as windows and fences. Rather than capturing a single image, we instruct the user to take a short image sequence while slightly moving the camera. Differences that often exist in the relative positions of the background and the obstructing elements with respect to the camera allow us to separate them based on their motions, and to recover the desired background scene as if the visual obstructions were not there. We show results on controlled experiments and many real and practical scenarios, including shooting through reflections, fences, and raindrop-covered windows.
Shell Research; United States. Office of Naval Research (Navy Fund 6923196)
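The separation relies on parallax: across the short burst, the background and the obstruction move differently relative to the camera. A greatly simplified sketch of that intuition is given below, using OpenCV to align every frame to a reference via a background-dominated homography and then taking a robust per-pixel median, which suppresses obstruction pixels that land in different places after alignment. The actual method jointly estimates dense motion fields and a layer decomposition; this snippet only illustrates the motion cue.

```python
import cv2
import numpy as np

def align_and_composite(frames, reference_index=0):
    ref = cv2.cvtColor(frames[reference_index], cv2.COLOR_BGR2GRAY)
    aligned = []
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Estimate a background-dominated homography from sparse ORB matches.
        orb = cv2.ORB_create()
        k1, d1 = orb.detectAndCompute(gray, None)
        k2, d2 = orb.detectAndCompute(ref, None)
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
        src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
        aligned.append(cv2.warpPerspective(frame, H, ref.shape[::-1]))
    # Robust composite: after alignment, obstruction pixels become outliers
    # at any given location, so a per-pixel median keeps the background.
    return np.median(np.stack(aligned), axis=0).astype(np.uint8)
```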
Using Image-Processing Settings to Determine an Optimal Operating Point for Object Detection on Imaging Devices
This publication describes techniques and processes for using image-processing settings (e.g., Auto-Exposure (AE), Auto-Focus (AF), and/or Auto-White Balance (AWB)) to determine an optimal operating point for object detection by an object detector on an imaging device. An operating point is provided to the object detector by the manufacturer to enable it to execute object detection. Through object detection, the object detector determines whether an object is present in the scene based on a confidence score. The optimal operating point is the one whose computed image-processing setting is closest to the ideal value of that setting. In an example, a fixed penalty function allows an optimal operating point to be determined by comparing the computed AE results for an image at different operating points against the ideal AE for that image. The smallest difference between the computed AEs and the ideal AE corresponds to the optimal operating point for the image. The process can be repeated over many images to determine an optimal operating point across many types of images. Additionally, the process can be conducted with other image-processing settings, such as AF and AWB, to guide the selection of an optimal operating point across many settings. The determined optimal operating point can then be provided to an object detector on an imaging device to provide a positive user experience with the device.
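The selection procedure reduces to an argmin over candidate operating points of an average penalty between the computed setting and its ideal value. A minimal sketch of that loop follows; compute_setting and ideal_setting stand in for the device's actual image pipeline and are assumptions for illustration.

```python
# Sketch of penalty-based operating-point selection: for each candidate point,
# compute the image-processing setting (e.g., auto-exposure) it yields on each
# image, penalize the distance to that image's ideal value, and pick the point
# with the smallest average penalty over the image set.
def select_operating_point(operating_points, images, compute_setting, ideal_setting):
    best_point, best_penalty = None, float("inf")
    for point in operating_points:
        # Fixed penalty: absolute difference between computed and ideal value,
        # averaged over the image set so the choice generalizes across scenes.
        penalty = sum(abs(compute_setting(img, point) - ideal_setting(img))
                      for img in images) / len(images)
        if penalty < best_penalty:
            best_point, best_penalty = point, penalty
    return best_point

# Toy usage: operating points are exposure offsets; the ideal is no offset.
images = [0.2, 0.5, 0.8]  # stand-ins for scene brightness
best = select_operating_point(
    operating_points=[-1.0, 0.0, 1.0],
    images=images,
    compute_setting=lambda img, p: img + p,
    ideal_setting=lambda img: img)
print(best)  # 0.0
```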
MoSculp: Interactive Visualization of Shape and Time
We present a system that allows users to visualize complex human motion via
3D motion sculptures---a representation that conveys the 3D structure swept by
a human body as it moves through space. Given an input video, our system
computes the motion sculpture and provides a user interface for rendering it
in different styles, including options to insert the sculpture back into
the original video, render it in a synthetic scene, or physically print it.
To provide this end-to-end workflow, we introduce an algorithm that estimates
the human's 3D geometry over time from a set of 2D images and develop a
3D-aware image-based rendering approach that embeds the sculpture back into the
scene. By automating the process, our system takes motion sculpture creation
out of the realm of professional artists, and makes it applicable to a wide
range of existing video material.
By providing viewers with 3D information, motion sculptures reveal space-time
motion information that is difficult to perceive with the naked eye, and allow
viewers to interpret how different parts of the object interact over time. We
validate the effectiveness of this approach with user studies, finding that our
motion sculpture visualizations are significantly more informative about motion
than existing stroboscopic and space-time visualization methods.
Comment: UIST 2018. Project page: http://mosculp.csail.mit.edu
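At its core, a motion sculpture is the surface swept by the body's geometry over time. The toy sketch below only shows that sweep, concatenating per-frame surface meshes (vertices plus shared triangle faces) into one mesh that could be rendered or printed; it assumes the per-frame meshes are already available, whereas the actual system estimates them from the 2D video and renders the result back into the scene.

```python
import numpy as np

def sweep_meshes(per_frame_vertices, faces):
    """per_frame_vertices: list of (V, 3) arrays, one per video frame.
    faces: (F, 3) triangle indices shared by every frame.
    Returns (vertices, faces) of the combined sculpture mesh."""
    all_vertices, all_faces = [], []
    offset = 0
    for verts in per_frame_vertices:
        all_vertices.append(verts)
        # Offset face indices so they point into this frame's vertex block.
        all_faces.append(faces + offset)
        offset += verts.shape[0]
    return np.concatenate(all_vertices), np.concatenate(all_faces)

# Toy usage: a single triangle translated across 5 frames stands in for the
# estimated body surface moving through space.
tri = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)
frames = [tri + [0.2 * t, 0, 0] for t in range(5)]
verts, faces = sweep_meshes(frames, np.array([[0, 1, 2]]))
print(verts.shape, faces.shape)  # (15, 3) (5, 3)
```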